Skip to content

Add WhatsApp import from decrypted backup (#136)#160

Open
eddowding wants to merge 15 commits intowesm:mainfrom
eddowding:feat/whatsapp-import
Open

Add WhatsApp import from decrypted backup (#136)#160
eddowding wants to merge 15 commits intowesm:mainfrom
eddowding:feat/whatsapp-import

Conversation

@eddowding
Copy link

@eddowding eddowding commented Feb 26, 2026

Summary

  • Add import --type whatsapp command for importing messages from a decrypted WhatsApp msgstore.db backup
  • Extend sender/query filters in both DuckDB and SQLite engines to support WhatsApp's sender_id path (not just email-based message_recipients)
  • Update MCP handler to route non-email from values to display name and phone number matching
  • Extend Parquet cache schema with sender_id, message_type, attachment_count, phone_number, and title columns
  • Add cache schema versioning to force rebuild on column layout changes

What's in this PR

import --type whatsapp

  • New CLI: msgvault import --type whatsapp --phone "+447700900000" /path/to/msgstore.db
  • Reads a decrypted WhatsApp msgstore.db (SQLite) as read-only
  • Maps WhatsApp schema to msgvault's existing multi-source tables
  • Batch processing (1000 messages at a time) for memory efficiency
  • Checkpoint/resume via sync_run cursor (last processed _id)
  • Progress callback with CLIProgress (same pattern as syncfull.go)
  • Optional --media-dir for copying media files to content-addressed storage
  • Optional --contacts for importing vCard contacts (display name resolution only — updates existing participants, does not create new ones)
  • Idempotent: re-running uses upsert dedup, no duplicates

What's imported

WhatsApp msgvault Notes
Text messages (type 0) messages + message_bodies Full text
Images (type 1) messages + attachments Metadata; file if --media-dir
Videos (type 2) messages + attachments Metadata only
Audio (type 3) messages + attachments Metadata
Voice notes (type 5) messages + attachments Metadata; file if --media-dir
Documents (type 13) messages + attachments Metadata
Stickers (type 90) messages + attachments Metadata; file if --media-dir
GIFs (type 4) messages + attachments Metadata
Reactions reactions Emoji reactions
Replies/quotes reply_to_message_id Via key_id lookup (scoped per-chat)
Group participants conversation_participants Role (admin/member)
Captions message_bodies.body_text From media_caption
1:1 chats conversations (direct_chat)
Group chats conversations (group_chat) With title

Skipped: system messages (type 7), calls (15/64/66), location shares (9), contacts (10), polls (99), statuses/stories (11).

Security

  • Media path traversal defense: sanitize paths from DB, reject absolute and .. prefixed paths, verify resolved path stays within --media-dir boundary
  • Streaming media processing: hash + copy via io.TeeReader (no io.ReadAll), capped at 100MB max file size
  • E.164 phone validation: reject non-numeric JID users, enforce 4–15 digit range, require + prefix in store layer
  • File permissions: 0600 for attachment files, 0750 for directories
  • Bounded memory: reply-threading map (keyIDToMsgID) cleared per-chat to prevent unbounded growth across 13k+ conversations

Query engine updates

  • DuckDB and SQLite buildFilterConditions now check both message_recipients (email path) and direct sender_id (WhatsApp/chat path) for sender filters
  • Phone number included in sender predicates for from:+447... queries
  • MatchesEmpty filters account for sender_id to avoid false positive "no sender" matches on WhatsApp messages
  • MCP from parameter: if value contains @, filters by email; if starts with +, filters by phone; otherwise filters by display name
  • Parquet cache schema extended with new columns; cache schema versioning forces full rebuild when column layout changes

Store additions

  • EnsureConversationWithType — like EnsureConversation but accepts conversationType parameter
  • EnsureParticipantByPhone — get-or-create by phone number with E.164 validation
  • UpdateParticipantDisplayNameByPhone — update-only (for contacts import, does not create participants)
  • EnsureConversationParticipant — insert-or-ignore into conversation_participants
  • UpsertReaction — insert-or-ignore into reactions table
  • UpsertMessageRawWithFormat — like UpsertMessageRaw but accepts format parameter

Design note: import --type vs import-whatsapp

The existing pattern uses import-mbox / import-emlx. This PR uses import --type whatsapp with a dispatcher, which is extensible for future sources (iMessage, Telegram, etc.) without adding new top-level commands. Happy to rename to import-whatsapp if you prefer consistency with the existing pattern.

MCP support

WhatsApp messages are fully queryable via MCP after import and cache rebuild:

# Search by sender name
list_messages(from: "Jane Smith")

# Search by phone
list_messages(from: "+447700900000")

# Full-text search
search_messages(query: "dinner plans", from: "Jane Smith")

# Conversation listing includes WhatsApp groups
list_conversations()

Test plan

  • go test ./internal/whatsapp/... — mapping and contacts unit tests
  • go test ./internal/query/... — DuckDB/SQLite engine tests (updated fixtures)
  • go test ./internal/store/... — store tests pass
  • go test ./internal/mcp/... — MCP handler tests pass
  • go vet ./... — clean
  • gosec ./... — no findings in PR files (expected false positives only)
  • Manual test: msgvault import --type whatsapp --phone "+44..." /path/to/msgstore.db --contacts /path/to/contacts.vcf
  • Manual test: msgvault build-cache --full-rebuild after import
  • Manual test: verify via msgvault search, TUI, and MCP queries

Tested with a real 1.19M message WhatsApp backup (13.5k conversations, April 2016–present). Import completes successfully, messages are searchable via TUI and MCP.

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (44cc74f)

Verdict: The PR introduces valuable WhatsApp import
functionality but requires important fixes for path traversal vulnerabilities, memory consumption, cache schema compatibility, and storage path routing before merging.

High Severity

Path traversal / arbitrary file read via media paths

  • Files: internal/whatsapp/importer.go:414-433, 477
    -486
  • Issue: filepath.Join(mediaDir, relPath) is used directly with untrusted DB data. Absolute paths (e.g., /etc/passwd) and directory traversal sequences (../../...) are not blocked before os.Open. A crafted backup database can trick the importer
    into reading arbitrary local files outside the intended mediaDir.
  • Remediation: Sanitize relPath using filepath.Clean(), reject absolute paths, and strictly verify that the resulting joined path resides within the mediaDir boundary before opening. Fall back to using just the base filename if
    necessary.

Media copy writes to the wrong root path

  • Files: internal/whatsapp/importer.go:454, 460
  • Issue: handleMediaFile builds storageDir as attachments/<hashprefix> and then anchors writes with filepath .Dir("") (effectively the current working directory) rather than cfg.AttachmentsDir(). This stores imported attachments outside the msgvault data directory, making them unretrievable.
  • Remediation: Pass the attachments root directory into the importer options. Write to `/<hash[:2]

/and store thestorage_pathas<hash[:2]>/(without theattachments/` prefix).

Cache schema compatibility regression for existing Parquet caches

  • Files: internal/query/duckdb.go:178, 197;
    cmd/msgvault/cmd/build_cache.go:98, 138
  • Issue: DuckDB queries now require new Parquet columns (attachment_count, sender_id, message_type, phone_number, title). The incremental cache logic lacks a
    schema-version gate and won't force a rebuild when old-format files exist, causing query failures.
  • Remediation: Add cache schema versioning and force a full rebuild on mismatch, or explicitly handle mixed schemas via union_by_name and defensive column handling.

Medium Severity

Unbounded memory consumption when processing media files

  • Files: internal/whatsapp/importer.go:440-444
  • Issue: io.ReadAll(f) loads the entire media file into memory. Large or malicious WhatsApp media files (like videos) can
    cause severe memory pressure or OOM crashes.
  • Remediation: Replace io.ReadAll with streaming. Hash the file incrementally via io.Copy (to hash.Hash), stream the content to the destination, and enforce a maximum allowed media size with io.LimitedReader.

**
from:+<phone> MCP filter path is routed to email filtering**

  • Files: internal/mcp/handlers.go:327; internal/query/sqlite.go:289; internal/query/duckdb.go:701
  • Issue
    :
    MCP treats +... as filter.Sender, but the sender predicates only compare email_address, not phone_number. Filtering by WhatsApp phone senders will fail to match.
  • Remediation: Include phone_number = ? in the sender predicates (for both recipient and direct
    -sender paths), or route +... inputs to a phone-aware field.

Contact import behavior contradicts contract

  • Files: internal/whatsapp/contacts.go:19, 39
  • Issue: ImportContacts claims to update existing participants, but it unconditionally calls Ensure ParticipantByPhone. This creates new participants and artificially inflates the matched count on every call.
  • Remediation: Implement "update-if-exists" semantics for contact imports, and only increment matched when an existing participant is actually found and updated.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (b11426d)

This PR implements the new WhatsApp import feature but introduces several medium-severity issues requiring fixes
, including an MCP context-poisoning vulnerability, potential memory exhaustion, and parsing/query regressions.

Medium Severity

  • Sender Identity Spoofing / Context Poisoning in MCP Tool
    • File: internal/mcp/handlers.go:327

Description: The condition strings.Contains(v, " @") incorrectly checks for a space before the @. Standard emails (user@example.com) fail this check and fall through to display name matching (filter.SenderName). This creates a spoofing vulnerability where an attacker can forge a display name to match
a target email and poison the LLM's context.
* Remediation: Remove the leading space: strings.Contains(v, "@").

  • Unbounded Memory Growth on Large Imports (DoS)

    • File: internal/whatsapp/importer. go:100, internal/whatsapp/importer.go:258
    • Description: keyIDToMsgID stores all imported message keys for the full run. A very large or crafted msgstore.db can force high memory usage and crash the process
      .
    • Remediation: Bound memory use by scoping this map per chat/batch, or replace it with a DB-backed lookup (indexed by source message key) combined with a deferred reply-resolution pass.
  • WhatsApp JID Parsing Error

    • File:
      internal/whatsapp/mapping.go:126
    • Description: strings.TrimSuffix(user, " @"+server) incorrectly expects a space before the @ symbol, causing it to fail at stripping standard WhatsApp JID suffixes. Additionally, the inline comment above it is mang
      led.
    • Remediation: Change " @"+server to "@" + server and correct the inline comment.
  • Regression in "Empty Sender" Filters

    • File: internal/query/duckdb.go:706, internal/ query/duckdb.go:721, internal/query/sqlite.go:292, internal/query/sqlite.go:313
    • Description: MatchesEmpty(ViewSenders) and MatchesEmpty(ViewSenderNames) only check
      message_recipients and ignore messages.sender_id. WhatsApp messages can have a valid direct sender without a from recipient row, causing them to be incorrectly treated as having an empty sender/name.
    • Remediation: Include sender_id -> participants in the empty
      checks for both query engines.
  • Database Pollution from Invalid Phone Values

    • File: internal/store/messages.go:782, internal/whatsapp/mapping.go:121, internal/whatsapp/importer.go:147
    • Description: EnsureParticipantByPhone accepts any string (including empty/invalid), and normalizePhone can return non-E.164 values from unexpected JID forms. This can insert garbage participants and pollute conversation membership/sender links.
    • Remediation: Validate
      phone numbers (strict E.164) before insertion, and skip or log invalid sender/member IDs.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (ed2bf62)

Summary Verdict: The code demonstrates strong security practices overall, but contains several medium-severity issues in the new WhatsApp importer regarding unbounded memory growth, silent
error ignoring, and hardcoded regional settings.

Medium Severity Findings

  1. Unbounded Memory Growth in keyIDToMsgID

    • Files: [internal/whatsapp/importer.go:96](/home/roborev/repos/msgvault/internal/whatsapp/
      importer.go:96), internal/whatsapp/importer.go:261
    • Description: The keyIDToMsgID map grows for the entire import
      and is never bounded or pruned. A very large or malicious msgstore.db with many unique key_id values can drive unbounded memory growth and crash the importer (availability/DoS risk).
    • Suggested Remediation: Replace the in-memory global map with a bounded cache plus DB lookup fallback, or
      persist key mappings in a temp table keyed by source_message_id and query on demand.
  2. Incorrect Metric Tracking and Limit Enforcement

    • Files: [internal/whatsapp/importer.go:171](/home/roborev/repos/msgvault/internal/
      whatsapp/importer.go:171), internal/whatsapp/importer.go:255, [internal/whatsapp/importer.go:272](/home
      /roborev/repos/msgvault/internal/whatsapp/importer.go:272)
    • Description: summary.MessagesAdded++ and totalAdded++ happen after every successful UpsertMessage, but upsert may simply update an existing row. On re-
      imports, the output can claim many messages were "added" when none were newly inserted, and the --limit flag can trigger early based on these inflated counts.
    • Suggested Remediation: Have UpsertMessage return an inserted bool flag, increment MessagesAdded only when inserted == true , and track a separate "processed" counter for limit semantics if desired.
  3. Silent Error Dropping in Contact Import

    • File: [internal/whatsapp/contacts.go:35](/home/roborev/repos/msgvault/internal/whatsapp/contacts.go:
  • Description: In ImportContacts, errors from UpdateParticipantDisplayNameByPhone are ignored unless err == nil && updated. If database writes fail, the caller still receives success-like results.
  • Suggested Remediation: Accumulate and return the first error (or an aggregated error
    count) while continuing best-effort processing.
  1. Hardcoded Country Code for Phone Numbers

    • File: internal/whatsapp/contacts.go:150
    • **
      Description:** A hardcoded UK country code (+44) is prepended to any local phone number starting with 0. This will corrupt contact numbers for users located in other countries.
    • Suggested Remediation: Make the default regional country code configurable or pass it as a parameter in ImportOptions.
  2. Silent Error Dropping in Attachment Metadata Update

    • File: internal/whatsapp/importer.go:420
    • Description: Silently ignoring the
      error from imp.store.DB().Exec when updating attachments metadata masks potential SQL schema errors (e.g., if width or duration_ms columns do not exist).
    • Suggested Remediation: Log or return the error to ensure schema mismatches are detectable.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (b3274f3)

The PR successfully implements the WhatsApp importer and query extensions, but requires fixes for a high-severity SQL generation bug and several medium-severity logic and data-handling issues
.

🔴 High Severity

SQL Generation Error with Sender Filters

  • Location: internal/query/sqlite.go:290 (and internal/query/sqlite.go:297)
  • Issue: When an empty sender filter (MatchesEmpty(View Senders)) and a sender name filter are used together, the LEFT JOIN for p_direct_sender is skipped by the first condition. This leads to a SQL runtime syntax error in the second condition (no such column: p_direct_sender.display_name).
  • Recommendation
    :
    Add LEFT JOIN participants p_direct_sender ON p_direct_sender.id = m.sender_id to the MatchesEmpty(ViewSenders) join block, or rewrite the sender filters to use EXISTS subqueries to avoid alias coupling.

🟡 Medium Severity

**
Reply Threads Break Across Resumes**

  • Location: internal/whatsapp/importer.go:329
  • Issue: The keyIDToMsgID map only tracks messages imported in the current memory process. If the import is resumed, quoting a message imported in a previous run
    silently drops the reply_to_message_id link.
  • Recommendation: If quoted.QuotedKeyID is not found in keyIDToMsgID, fall back to querying the database to resolve the message_id using the source_message_id.

**
Missing FTS Indexing Fallback for Direct Chats**

  • Location: internal/whatsapp/importer.go:290
  • Issue: Received messages in direct chats will have an empty sender address in the FTS index because the fallback to chat.User (which is correctly
    used during database senderID resolution) is missing in the FTS indexing block.
  • Recommendation: Add the fallback logic to FTS resolution: else if chat.GroupType == 0 && waMsg.FromMe == 0 { senderAddr = normalizePhone(chat.User, chat. Server) }.

Hardcoded UK Phone Normalization

  • Location: internal/whatsapp/contacts.go:145
  • Issue: normalizeVCardPhone automatically converts any local number starting with 0 to +44.... This UK-specific hardcoding
    will mis-normalize non-UK contacts, causing missed or incorrect participant name matches.
  • Recommendation: Remove the hardcoded UK conversion unless a country code is explicitly configured; otherwise, only accept explicit international forms (+ / 00).

Silenced Database Update Errors

  • Location: internal/ whatsapp/contacts.go:39
  • Issue: ImportContacts silently drops database update errors (UpdateParticipantDisplayNameByPhone) and still returns success, potentially hiding failed contact imports from the user.
  • Recommendation: Accumulate and return at least the first update error (or an aggregated error
    wrapper), while optionally continuing best-effort processing.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@eddowding eddowding force-pushed the feat/whatsapp-import branch from b3274f3 to 600d3d8 Compare February 26, 2026 12:31
@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (600d3d8)

Verdict: The PR introduces solid WhatsApp import capabilities and secure data handling, but requires fixes for a high-severity SQL join error and a few medium-severity logical regressions.

High Severity

1. SQL execution error due to missing join

  • File: internal/query/sqlite.go (around line 293)
  • Description: When filter.MatchesEmpty(ViewSenders) is true, the p_direct_sender alias
    is not joined. If a SenderName filter is simultaneously applied, the code skips adding the missing join but appends a condition referencing p_direct_sender.display_name. This results in a SQL execution error (no such column: p_direct_sender.display_name).
  • Suggested
    Fix:
    Add LEFT JOIN participants p_direct_sender ON p_direct_sender.id = m.sender_id to the joins array inside the else if filter.MatchesEmpty(ViewSenders) block.

Medium Severity

2. Checkpoint/resume is not actually
implemented despite UI messaging

  • Files: internal/whatsapp/importer.go:101, [internal/whatsapp/importer.go:391](/home/robore
    v/repos/msgvault/internal/whatsapp/importer.go:391)
  • Description: The importer writes checkpoints (UpdateSyncCheckpoint) and prints “Saving checkpoint... Run again to continue.”, but it never reads a prior checkpoint to resume. Re-runs will rescan from the
    start.
  • Suggested Fix: Load the last checkpoint at import start and initialize the per-chat/message cursor from it (or remove the checkpoint/resume messaging if not supported yet).

3. Reply threading misses already-imported quoted messages

  • Description: reply_to_message_id is set only when the quoted key exists in the in-memory keyIDToMsgID map from the current run. If the quoted message was imported in a previous run, reply links are dropped.
  • Suggested Fix
    :
    On map miss, do a DB lookup by (source_id, source_message_id) and set reply_to_message_id when found.

4. GetGmailIDsByFilter broadened sender matching can pull non-Gmail IDs into Gmail workflows

  • Files
    :
    internal/query/sqlite.go:896, [internal/query/duckdb.go:1472](/home/roborev/repos/msgvault/internal
    /query/duckdb.go:1472)
  • Description: This function is Gmail-specific by contract, but the sender filter logic now matches phone/direct-sender rows (WhatsApp/chat). In mixed-source datasets, this can return non-Gmail source_message_id values for deletion/staging flows.
  • Suggested Fix: Explicitly scope this method to Gmail sources (source_type='gmail' or equivalent join/filter) regardless of the sender filter semantics.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (4ef671b)

Summary Verdict: The code is secure and generally well-implemented, with two medium-severity logic issues identified in the contact import process.

Medium

  1. Silent data-path failures in contact import

    File: internal/whatsapp/contacts.go:37
    ImportContacts ignores errors from UpdateParticipantDisplayNameByPhone (err is dropped unless err == nil && updated). If DB writes fail, the command can report success with partial/zero updates and no warning.
    Suggested fix: Propagate the first error (or collect and return an aggregated error count), and surface failures in CLI output.

  2. Incorrect phone normalization can mis-assign names

    File: internal/whatsapp/contacts.go:143
    normalizeVCardPhone rewrites any local 0... number to +44....
    That is UK-specific and can incorrectly map non-UK contacts, leading to wrong participant name updates.
    Suggested fix: Only normalize unambiguous international formats (+ / 00) unless a country code is explicitly configured.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (03abaf4)

The code is generally clean and secure with no vulnerabilities found, but there are a few medium-severity issues related to query filtering and error handling that need addressing.

Medium

Incorrect Empty Sender Filtering
Files: /home/roborev/repos/msgvault/internal/query/sqlite.go:292, /home/roborev/repos/msgvault/internal/query/duckdb.go:704
MatchEmpty (ViewSenders) checks only the sender email (plus sender_id), not the sender phone. A phone-only sender in message_recipients can be incorrectly treated as an “empty sender.”
Suggested fix: In both engines, treat the sender as non-empty when either the email or phone
_number
is present.

Hidden Partial Failures on Contact Import
File: /home/roborev/repos/msgvault/cmd/msgvault/cmd/import.go:130
Contact import DB errors are only printed as warnings, and the command still exits with a
success status. This can hide partial failures in scripted usage.
Suggested fix: Return an error when whatsapp.ImportContacts returns an error, or gate this behavior behind an explicit “best-effort” flag.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (f11e5ff)

Verdict: The code is secure, but there are two medium-severity functional issues regarding vCard parsing and conversation metadata updates that need addressing.

Medium

  • vCard parsing misses common valid formats (line folding/encoded values), causing contact import misses

    • File: internal/whatsapp/contacts.go:59
    • Problem: parseVCardFile reads raw lines directly and does not unfold continuation lines or decode encoded
      FN values. Real-world .vcf exports often use folded lines and encoded names, so names/phones can be skipped or malformed.
    • Fix: Add RFC-compliant unfolding + decoding (or use a vCard parser lib), and add tests for folded FN/
      TEL lines and encoded names.
  • Conversation metadata is insert-only; existing rows never get title/type refresh

    • File: [internal/store/messages.go:741](/home/roborev/repos/msgvault/internal/store/messages.
      go:741)
    • Problem: EnsureConversationWithType does SELECT then INSERT, returning early when the row exists. If a group title/type changes (or was initially empty), the data stays stale.
    • Fix: Use upsert (ON CONFLICT) to
      update conversation_type, and update title when the new title is non-empty.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (58b39e3)

Summary Verdict: One medium severity issue was identified regarding vCard parsing case-sensitivity; no security vulnerabilities or high severity issues were found.

Medium

vCard parsing is case-sensitive and can silently miss valid contacts
Files: internal/whatsapp/contacts.go:92, [internal/whatsapp/contacts.go:104](/
home/roborev/repos/msgvault/internal/whatsapp/contacts.go:104)
BEGIN:VCARD/END:VCARD, FN, and TEL are matched with exact-case prefixes. vCard field names are case-insensitive, so lowercase/mixed
-case files can be skipped or partially parsed.
Suggested fix: parse the key segment case-insensitively (e.g., uppercase only the key/params part before matching), while preserving original value bytes.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (b4c6db8)

Review completed with 1 High and 4 Medium severity findings regarding
email/JID parsing bugs, terminal injection vulnerabilities, and vCard decoding errors.

High

  • File: internal/mcp/handlers.go:326
    Issue: Incorrect string matching for email addresses. strings.Contains(v, " @") looks for a space
    before the @ symbol, which fails to match valid email addresses and misroutes them to the SenderName filter instead of Sender.
    Remediation: Change to strings.Contains(v, "@").

Medium

  • File: cmd/msgvault/cmd/import. go:195, 205 (Also introduced in internal/tui/view.go:536, 541, 897, 904)
    Issue: Terminal control-sequence injection. Untrusted message/chat metadata (e.g
    ., WhatsApp snippet, chat titles) is written directly to terminal output, allowing crafted values containing ANSI/OSC escape sequences to manipulate the terminal display (spoofing, clipboard injection).
    Remediation: Sanitize untrusted text before rendering to the terminal/TUI by stripping ESC/C0/C
    1 control characters and normalizing newlines.

  • File: internal/whatsapp/contacts.go:69, 149
    Issue: Quoted-printable decoding is incomplete for soft line breaks. The current implementation handles =XX bytes but not QP soft wraps (= at EOL), which
    can corrupt folded encoded names common in vCards.
    Remediation: Decode using the mime/quotedprintable reader or explicitly strip =\r?\n soft breaks before hex decoding.

  • File: internal/whatsapp/contacts.go:188
    Issue: Phone normalization
    can incorrectly convert ambiguous local numbers into E.164. The code prefixes + for any 10+ digit number not starting with 0, potentially mis-normalizing local/national formats and matching the wrong participant.
    Remediation: Only accept numbers with an explicit + or 00 unless a default region/country code is explicitly configured.

  • File: internal/whatsapp/mapping.go:134
    Issue: JID suffix trimming failure. strings.TrimSuffix(user, " @"+server) assumes a space before the @ symbol,
    failing to strip the server suffix from standard JIDs (e.g., 447700900000@s.whatsapp.net).
    Remediation: Change to strings.TrimSuffix(user, "@"+server).


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (7a7f594)

Summary Verdict: The code review identified one medium-severity issue related to attachment deduplication during failed media copies.

Medium

  • File: internal/whatsapp/ importer.go (~L300) and internal/store/messages.go (UpsertAttachment, ~L650)
  • Description: If a media copy fails, handleMediaFile returns an empty contentHash and storagePath, but the import still calls Ups ertAttachment. Because UpsertAttachment deduplicates by (message_id, content_hash), multiple attachments with an empty hash will incorrectly collapse into a single record.
  • Suggested remediation: Only deduplicate when content_hash != '' (or use a fallback stable key like the WhatsApp media row
    ID or path), and update metadata by the attachment row ID instead of the hash.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (d56499e)

Verdict: The PR implements WhatsApp import functionality with good security practices, but requires fixes for message ID collisions and string parsing errors.

Medium

  • Message ID collision / data overwrite

    • Files: internal/whatsapp/queries.go:58, internal /whatsapp/importer.go:258
    • Issue: fetchMessages coerces m.key_id to '' (COALESCE(m.key_id, '')), then the importer uses that as SourceMessageID for upserts. If any non
      -skipped messages have null/empty key_id, multiple distinct rows can collapse onto the same logical message key.
    • Remediation: Reject/skip empty key_id messages with a warning, or use a stable fallback key (e.g., chat_row_id:_id ) when key_id is empty.
  • Unexpected space in suffix trimming logic

    • File: internal/whatsapp/mapping.go:132
    • Issue: The suffix trimming logic strings.TrimSuffix(user, " @"+server) contains an
      unexpected space before the @, which will fail to strip valid JID suffixes. Additionally, the comment contains an accidental path replacement (@internal/api/server_test.go instead of @server).
    • Remediation: Remove the space to use "@" + server and fix the comment.
  • Unexpected space in email matching

    • File: internal/mcp/handlers.go:327

    • Issue: strings.Contains(v, " @") includes an unexpected space before the @, which will fail to match standard email addresses without spaces.

    • Remediation: Change to strings.Contains(v, "@").


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link

roborev-ci bot commented Feb 26, 2026

roborev: Combined Review (b7dcbcd)

Verdict: The PR introduces solid features and security practices, but requires fixes for DSN construction, terminal injection in single-line outputs, and phone number normalization.

Medium

1. SQLite DSN String Concatenation
File: internal/whatsapp/importer.go:40
Issue: SQLite DSN is built by string concatenation (waDBPath + "?mode=ro..."). If waDBPath contains URI metacharacters (not
ably ?), DSN parsing can be altered so enforced options like mode=ro are not reliably applied, and the opened DB path can differ from the validated path.
Remediation: Build DSN with explicit URI construction/escaping instead of concatenation (for example, file: URI + escaped
path + fixed query params), and reject/normalize reserved URI characters in filesystem paths before opening.

2. Terminal Injection in Single-Line Output
Files: internal/textutil/encoding.go:183, cmd/msgvault/cmd/import.go:204 , internal/tui/view.go:533
Issue: SanitizeTerminal intentionally preserves \n and \r. In single-line render paths (progress line and list rows), that still allows output spoofing/layout breakage.
Remediation: Add a
single-line sanitizer (strip \r, \n, \t) for these UI contexts, and keep the current sanitizer for multiline-safe contexts.

3. Contact Phone Normalization
File: internal/whatsapp/contacts.go:205
Issue
:
normalizeVCardPhone keeps all digits when input starts with +, so values like +44 (0)7700... become +4407700... and won’t match participant numbers (+447700...).
**
Remediation:** Strip optional trunk markers like (0) before digit normalization for +-prefixed numbers, and enforce E.164 length bounds.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

Add `import --type whatsapp` command for importing messages from a
decrypted WhatsApp msgstore.db into the msgvault unified schema.

New package internal/whatsapp/:
- Reads msgstore.db as read-only SQLite
- Maps WhatsApp schema to msgvault tables (conversations, participants,
  messages, attachments, reactions, reply threading)
- Batch processing (1000 msgs/batch) with checkpoint/resume
- Optional --contacts for vCard name resolution (update-only, no creation)
- Optional --media-dir for content-addressed media storage
- Imports: text, images, video, audio, voice notes, documents,
  stickers, GIFs, reactions, replies, group participants
- Skips: system messages, calls, location shares, contacts, polls

Security:
- Media path traversal defense (sanitize, reject absolute/.. paths,
  boundary check against mediaDir)
- Streaming hash + copy for media (no io.ReadAll, 100MB max)
- E.164 phone validation (reject non-numeric JIDs, 4-15 digit range)
- File permissions 0600/0750 for attachments
- Per-chat reply map scoping to bound memory

Query engine updates:
- Sender filters in DuckDB and SQLite check both message_recipients
  (email path) and direct sender_id (WhatsApp/chat path)
- Phone number included in sender predicates for from:+447... queries
- MatchesEmpty filters account for sender_id to avoid false positives
- MCP handler routes non-email from values to display name matching
- Parquet cache extended with sender_id, message_type, attachment_count,
  phone_number, title columns
- Cache schema versioning to force rebuild on column layout changes

Store additions:
- EnsureConversationWithType (parameterized conversation_type)
- EnsureParticipantByPhone (E.164 validated, with identifier row)
- UpdateParticipantDisplayNameByPhone (update-only for contacts)
- EnsureConversationParticipant, UpsertReaction, UpsertMessageRawWithFormat

Tested with 1.19M messages (13.5k conversations, April 2016-present).
- Add missing p_direct_sender JOIN in SQLite MatchesEmpty(ViewSenders)
  path so SenderName filter doesn't reference an unjoined alias
- Add DB fallback for reply threading: when quoted message key_id is
  not in the per-chat in-memory map, look it up in the messages table
  to link replies from previous runs or other chats
- Scope GetGmailIDsByFilter to Gmail sources only (JOIN on
  source_type='gmail' in SQLite, message_type filter in DuckDB
  Parquet fallback) to prevent WhatsApp IDs leaking into
  deletion/staging workflows
- Remove misleading checkpoint PageToken (resume not implemented;
  re-runs are safe via upsert dedup)
…rmalization

- ImportContacts now counts DB errors and returns an aggregated error
  instead of silently dropping failures from UpdateParticipantDisplayNameByPhone
- Remove UK-specific 0→+44 normalization in normalizeVCardPhone; local
  numbers without country code are now skipped as ambiguous
- Only normalize unambiguous international formats (+ prefix, 00 prefix)
…s error

- MatchesEmpty(ViewSenders) now treats a sender as non-empty when either
  email_address or phone_number is present (both SQLite and DuckDB)
- Contact import errors now cause the import command to exit with failure
  instead of printing a warning and returning success
…refresh

- parseVCardFile now unfolds RFC 2425 continuation lines (leading
  space/tab) so multi-line FN and TEL values are correctly parsed
- Decode QUOTED-PRINTABLE encoded FN values (e.g., =C3=A9 → é)
- EnsureConversationWithType now updates conversation_type and title
  on existing rows when values have changed (non-empty title only)
- Added tests for folded lines, encoded names, and QP decoding
vCard field names are case-insensitive per RFC 2426. Match BEGIN/END/FN/TEL
using uppercased key portion while preserving original value bytes.
…erminal injection

- Handle QUOTED-PRINTABLE soft line breaks (= at EOL) during vCard
  parsing by joining continuation lines before property extraction
- Tighten phone normalization to only accept numbers with explicit
  country code indicators (+ or 00 prefix), avoiding false matches
- Add SanitizeTerminal() to strip ANSI escape sequences and control
  characters from untrusted metadata (chat names, snippets) before
  rendering to terminal/TUI
- Add tests for all three fixes
Databases created before the WhatsApp feature have no phone_number,
sender_id, message_type, attachment_count, or title columns because
CREATE TABLE IF NOT EXISTS is a no-op for existing tables.

This breaks the MCP server and cache builder with:
  Binder Error: Column "phone_number" in REPLACE list not found

Add ALTER TABLE migrations in InitSchema() for all v2 columns.
Silently ignores "duplicate column name" errors for databases that
already have the columns.
Existing v2 caches may have been built before the schema migration
added phone_number to the participants table, resulting in Parquet
files without the column. Bumping to v3 ensures build-cache detects
the mismatch and triggers a full rebuild automatically.
@eddowding
Copy link
Author

Post-rebase test failures on v0.9.0

After rebasing feat/whatsapp-import onto current main (v0.9.0), all 5 TestBuildCache_* tests fail with:

Binder Error: Table "m" does not have a column named "attachment_count"

Root cause: The test schemas in cmd/msgvault/cmd/build_cache_test.go don't include the three columns added by this branch (attachment_count, sender_id, message_type). There are 4 CREATE TABLE messages definitions in the test file (lines ~43, ~1131, ~1331, ~1410) that all need updating.

Fix: Add to each test schema:

attachment_count INTEGER DEFAULT 0,
sender_id INTEGER,
message_type TEXT NOT NULL DEFAULT 'email',

Build passes clean, and all other test packages pass. Just these 5 tests in cmd/msgvault/cmd.

@wesm
Copy link
Owner

wesm commented Feb 27, 2026

From Claude

"The test failures are a legitimate bug in this branch, not a rebase artifact. The branch adds three new
columns to the messages table in the production schema (internal/store/schema.sql):

  1. attachment_count (line 143)
  2. sender_id (line 122)
  3. message_type (line 112)

The build cache query in cmd/msgvault/cmd/build_cache.go (lines 249-252) now references all three columns:

COALESCE(TRY_CAST(m.attachment_count AS INTEGER), 0) as attachment_count,
m.sender_id,
COALESCE(TRY_CAST(m.message_type AS VARCHAR), '') as message_type,

But the test helper setupTestSQLite() in build_cache_test.go (line 43) still creates the messages table
without these columns. The test schema was never updated to match the production schema changes made in this
branch's own commits.

This is the contributor's responsibility to fix — the branch modified the production schema and the build
cache queries but didn't update the test schemas. The fix is straightforward: add the three missing columns to
the test's CREATE TABLE messages definition. There's one setupTestSQLite() function (line 43) that all the
failing tests share, so a single edit should fix all 5 tests.

The CI on main passes because main doesn't have these columns in the query — they were introduced by this
branch."

Users upgrading from pre-WhatsApp msgvault have stale Parquet cache
files that lack newer columns (phone_number, attachment_count,
sender_id, message_type, title). The SELECT * REPLACE(...) syntax
hard-fails with a binder error when a named column doesn't exist.

Probe actual Parquet schema at engine init time via DESCRIBE SELECT *.
Conditionally build CTEs to REPLACE existing columns or ADD missing
ones with sensible defaults. Log a warning suggesting build-cache
--full-rebuild.

Also fixes all 5 build_cache_test.go schemas that were missing the
new columns, unblocking the test suite.
- Strip "(0)" trunk prefix before digit extraction in normalizeVCardPhone.
  Common in UK/European numbers: +44 (0)7700 means +447700, not +4407700.
- Use file: URI for WhatsApp SQLite DSN to safely handle paths containing
  '?' or other special characters.
@eddowding eddowding force-pushed the feat/whatsapp-import branch from b7dcbcd to 1862cfa Compare February 27, 2026 10:32
@eddowding
Copy link
Author

Fixes pushed

Two commits addressing the test failures and a related runtime issue:

1. Graceful handling of missing Parquet columns (47c7a64)

  • Users upgrading from pre-WhatsApp msgvault have stale Parquet cache files missing the new columns (phone_number, attachment_count, sender_id, message_type, title)
  • SELECT * REPLACE(...) hard-fails with a binder error when a column doesn't exist
  • Fix: probe the actual Parquet schema at engine init via DESCRIBE SELECT *, then conditionally build CTEs — REPLACE columns that exist, ADD missing ones with sensible defaults
  • Logs a warning suggesting build-cache --full-rebuild
  • Added TestDuckDBEngine_StaleParquetSchema to verify graceful degradation
  • Also fixes all 5 build_cache_test.go schemas that were missing the new columns — all 15 TestBuildCache_* tests pass now

2. WhatsApp bug fixes (1862cfa)

  • Strip (0) trunk prefix from vCard phone numbers before digit extraction (common in UK/European contacts: +44 (0)7700+447700)
  • Use file: URI for WhatsApp SQLite DSN to safely handle paths containing ? or other special characters

Full test suite green: go test -tags fts5 ./...

@roborev-ci
Copy link

roborev-ci bot commented Feb 27, 2026

roborev: Combined Review (1862cfa)

Verdict: The codebase requires a few medium-severity fixes for terminal injection on error paths and message ID collisions, but is otherwise robustly implemented.

Medium

  1. Terminal injection on error paths

    • Files
      :
      import.go#L221, [importer.go#L133](/home/roborev/repos/msgvault/internal/whatsapp/importer.
      go#L133)
    • Issue: Terminal injection is still possible on error paths. ImportCLIProgress.OnError prints err directly to the terminal, while several importer errors embed untrusted values from WhatsApp DB content (for example sourceConvID / JID-derived strings
      ). A crafted backup can include control sequences and manipulate terminal output (spoofing, OSC clipboard attacks, etc.).
    • Remediation: Sanitize all terminal-bound error text before printing (for example textutil.SanitizeTerminal(err.Error()) in OnError), and avoid formatting
      raw DB strings directly into human-facing errors unless sanitized first.
  2. Possible message ID collision/data overwrite when key_id is missing

    • Files: [queries.go:58](/home/roborev/repos/msgvault/internal/whatsapp/queries.
      go:58), mapping.go:55, [importer.go:274](/home/roborev/repos/msgvault/internal/whatsapp/importer.go
      :274)
    • Issue: fetchMessages coerces m.key_id to '', and that value is used as source_message_id for upsert. If any importable rows have null/empty key_id, multiple messages can collapse onto
      the same logical ID.
    • Remediation: Keep key_id nullable, and either skip rows with empty key IDs (with warning) or derive a stable fallback ID (for example wa_rowid:<rowid>).

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

- Sanitize error messages via textutil.SanitizeTerminal before printing
  in OnError, preventing terminal injection from crafted WhatsApp backups
  containing control sequences in JIDs or conversation IDs.
- Skip messages with empty key_id during import — they can't be uniquely
  identified for upsert and would collide/overwrite each other.
@roborev-ci
Copy link

roborev-ci bot commented Feb 27, 2026

roborev: Combined Review (d8f4dfd)

Summary Verdict: This PR requires updates to address a high-severity terminal injection bypass and a medium-severity database schema migration bug.

High

  • Terminal Injection Bypass via UTF-8
    C1 Control Characters
    • File: internal/textutil/encoding.go, lines 196-209
    • Description: The SanitizeTerminal function aims to prevent terminal injection by stripping C0/C1 control characters. However, the check if c < 0x20 || (c >= 0x7f && c <= 0x9f) evaluates only the first raw byte (c) of a character. In UTF-8, C1 control characters (such as the Control Sequence Introducer CSI, U+009B) are encoded using
      two bytes (e.g., 0xC2 0x9B). Because the first byte 0xC2 is greater than 0x9F, the initial byte check is bypassed and the character is decoded and appended unchanged. An attacker can exploit this by sending a WhatsApp message containing UTF-8 encoded C1
      control characters to execute arbitrary ANSI terminal escape sequences.
    • Suggested Remediation: Move the control character validation so it applies to the decoded rune r instead of the raw initial byte:
      r, size := utf8.DecodeRuneInString(s[i:])
      
      if r == utf8.RuneError && size == 1 {
      	i++
      	continue
      }
      
      // Allow tab, newline, carriage return; strip other C0/C1 control chars.
      if (r < 0x20 && r != '\

t' && r != '\n' && r != '\r') || (r >= 0x7f && r <= 0x9f) {
i += size
continue
}

b.WriteString(s[i : i+size])
i +=

size
```

Medium

  • Legacy DBs fail in build-cache before migrations run
    • Files: [cmd/msgvault/cmd/build_cache.go:87](/home/roborev/repos/msgvault/cmd/msgvault/cmd
      /build_cache.go:87), cmd/msgvault/cmd/build_cache.go:613
    • Description: build-cache now selects new columns (attachment_count, sender_id, message_type, phone_number, title) but the command path itself does not run InitSchema() migrations first. On an older database, this will fail with a no such column error when building the cache.

Suggested Remediation: Run schema migration at the start of build-cache (e.g., open store + InitSchema()), or make export queries column-aware (PRAGMA table_info + defaults).


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

… everywhere

SanitizeTerminal now decodes full UTF-8 runes before checking for C1
control characters (U+0080–U+009F). Previously the check operated on
raw bytes, so multi-byte C1 chars like U+009B (CSI, encoded as 0xC2
0x9B) bypassed the filter — an attacker could inject arbitrary ANSI
escape sequences via WhatsApp chat names or message content.

Added InitSchema() calls to 10 commands that were missing it:
build-cache, show-message, list-senders, list-domains, list-labels,
export-attachments, export-eml, repair-encoding, verify, and
update-account. Without migrations, legacy databases lacking new
columns (attachment_count, sender_id, message_type, phone_number,
title) would fail with "no such column" errors.
@roborev-ci
Copy link

roborev-ci bot commented Feb 27, 2026

roborev: Combined Review (38688a7)

Synthesis unavailable. Showing raw review outputs.

Agent: codex | Type: security | Status: done
  1. Mediuminternal/textutil/encoding.go:205
    SanitizeTerminal currently preserves carriage return (\r). Untrusted strings (chat names, errors, participant metadata) are passed through this sanitizer and then printed to terminal. \r can still rewrite the current line and spoof/hide output, which is a terminal-injection/log-forging vector even after ANSI stripping.
    Remediation: Strip \r for untrusted text (keep only \n/\t if needed), or escape it to a visible literal (\\r) before printing.

  2. Lowinternal/whatsapp/contacts.go:129, internal/whatsapp/contacts.go:43
    vCard FN values are stored as participant display_name without control-character validation. A malicious .vcf can persist control characters into the DB, creating stored terminal-injection risk in any output path that does not sanitize on render.
    Remediation: Validate/sanitize contact names on import (remove C0/C1 controls, enforce max length), in addition to output-side sanitization.

Agent: gemini | Type: security | Status: failed

Error: Review failed. Check CI logs for details.

Agent: codex | Type: default | Status: done
  1. Brief summary

These commits add a new WhatsApp import pipeline (msgvault import --type whatsapp), extend schema/query/cache handling for chat-specific fields (phone_number, sender_id, message_type, attachment_count, conversations.title), add stale-Parquet compatibility in DuckDB, run schema migrations more broadly from CLI commands, and add terminal-output sanitization plus vCard contact import support.

  1. Issues found

  2. Mediuminternal/whatsapp/importer.go (around summary.MessagesAdded++ / totalAdded++ in the main message loop)
    MessagesAdded is incremented unconditionally after UpsertMessage, even when the row already exists (upsert update path). This makes the summary inaccurate on reruns and can make --limit stop early relative to truly new imports.
    Suggested fix: have UpsertMessage return (id, inserted bool) (or equivalent), increment MessagesAdded only when inserted == true, and use a separate counter for limit semantics if limit is intended to cap processed rows.

  3. Mediuminternal/store/messages.go (inside EnsureConversationWithType, update path)
    The two UPDATE conversations ... calls ignore returned errors (_, _ = s.db.Exec(...)). If the DB is locked or fails writes, caller gets success and stale conversation metadata persists silently.
    Suggested fix: check and return update errors, e.g. if _, err := s.db.Exec(...); err != nil { return 0, fmt.Errorf(...) }.

  4. Lowinternal/store/messages.go (inside EnsureParticipantByPhone, existing-participant branch)
    For existing participants found by phone_number, the function returns early and does not ensure participant_identifiers row exists, despite the function contract/comment saying it also creates one. This can leave legacy rows partially migrated for identifier-based lookups.
    Suggested fix: in the existing-row path, run the same INSERT OR IGNORE INTO participant_identifiers (...) before returning.

  5. Low (testing gap)internal/whatsapp/importer.go
    There is good unit coverage for mapping/contact parsing, but no importer-level test covering rerun/idempotency and summary correctness (processed/added/skipped) with pre-existing messages.
    Suggested fix: add an integration-style importer test that imports the same dataset twice and asserts MessagesAdded behavior and --limit semantics.

Agent: gemini | Type: default | Status: done

No review output generated

- Add union_by_name=true to read_parquet() in parquetCTEs() to handle
  schema drift across partition files (old partitions lack new columns)
- Add direct_sender CTE to SearchFast, SearchFastCount, and
  SearchFastWithStats materialization, matching ListMessages pattern
  for WhatsApp messages that use sender_id instead of message_recipients
- Update buildSearchConditions() sender/from/to/recipient filters to
  check phone_number and sender_id alongside email_address
- Add from_phone to all SearchFast SELECT/Scan paths including the
  cached temp table pagination
- Update test expectation for new from-filter EXISTS pattern
@roborev-ci
Copy link

roborev-ci bot commented Feb 27, 2026

roborev: Combined Review (ef0f471)

Review Verdict: 1 Medium severity issue identified; lower severity findings have been omitted per instructions.

Medium

DuckDB search paths do not populate message_type / conversation title fields now used by TUI formatting.

  • Files: internal/query/duckdb.go:1816
    , [internal/query/duckdb.go:2179](/home/roborev/repos/msgvault/internal/query/duckdb.go:217
    9), internal/query/duckdb.go:2034, consumer at [internal/tui/view.go:542](/home/roborev/
    repos/msgvault/internal/tui/view.go:542)
  • Problem: SearchFast / SearchFastWithStats / cached page scans add FromPhone but still don’t select/scan msg.message_type and conv.title,
    while the TUI now relies on those for WhatsApp subject/title fallback. Search results can render blank/less-informative rows for chat messages.
  • Suggested Fix: Include COALESCE(msg.message_type,'') AS message_type and COALESCE(c.title,'') AS conv _title in the DuckDB search SELECT statements and scan them into MessageSummary.

Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

…pp import

Three bugs found during real-world import of 1.19M messages:

1. Groups with group_type=0 but @g.us server (Communities/sub-groups)
   were misclassified as direct chats — 617 groups affected. Extract
   isGroupChat() helper and use it consistently in importer.

2. ~50% of messages in newer groups have "lid" senders that resolved to
   NULL. Add fetchLidMap() to read jid_map table and resolveLidSender()
   fallback for message senders, reaction senders, and FTS indexing.

3. Newer groups have empty group_participants table. Add participant
   fallback that tracks resolved senders per chat and ensures each is
   registered as a conversation participant after the message loop.

Also updates denormalised conversation counts (message_count,
participant_count, last_message_at) after the full import.
@roborev-ci
Copy link

roborev-ci bot commented Mar 3, 2026

roborev: Combined Review (ece0486)

Review complete: Identified
2 Medium severity issues regarding DuckDB query scoping and terminal sanitization in the new WhatsApp import functionality.

Medium

Gmail scoping in DuckDB delete/staging path is heuristic and can include non-Gmail rows
File: [internal/query/duckdb.go#L1592](/home
/roborev/repos/msgvault/internal/query/duckdb.go#L1592)

GetGmailIDsByFilter now scopes by message_type ('', 'email', NULL) instead of true source type. If cache data is stale/malformed (or
message_type missing/empty), non-Gmail messages can leak into this Gmail-specific workflow.
Suggested fix: Include source_type in cache (sources parquet) and filter by source_type='gmail', or filter by known Gmail source_ids from SQLite.

Terminal sanitizer still allows carriage return/newline control effects
File: internal/textutil/encoding.go#L146
SanitizeTerminal strips ANSI
/C0/C1 but preserves \r and \n. This function is used in single-line contexts (progress/status/TUI fields), where \r can still rewrite lines and \n can break layout/log integrity.
Suggested fix: Add a strict
single-line sanitizer (strip \r/\n) for progress/TUI list fields, or make the default sanitizer strip \r at minimum.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants